Overview

Dataset statistics

Number of variables: 36
Number of observations: 1000000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 5
Duplicate rows (%): < 0.1%
Total size in memory: 282.3 MiB
Average record size in memory: 296.0 B
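
The duplicate-row count and the average record size can be reproduced from the raw rows. The following is a minimal, standard-library sketch. It assumes each row is available as a tuple of column values. `count_duplicate_rows` and `average_record_size` are hypothetical helper names and are not part of ydata-profiling. In pandas, a comparable duplicate count is `df.duplicated().sum()`.

```python
from collections import Counter


def count_duplicate_rows(rows):
    """Count rows that repeat an earlier row. The first occurrence is not counted,
    which mirrors pandas' DataFrame.duplicated(keep="first")."""
    counts = Counter(tuple(r) for r in rows)
    return sum(c - 1 for c in counts.values() if c > 1)


def average_record_size(total_bytes, n_rows):
    """Average in-memory size per row, in bytes."""
    return total_bytes / n_rows


rows = [(1, 0, 3), (2, 1, 3), (1, 0, 3), (1, 0, 3)]
print(count_duplicate_rows(rows))  # 2
# 282.3 MiB spread over 1,000,000 rows is about 296 bytes per row
print(round(average_record_size(282.3 * 2**20, 1_000_000), 1))  # 296.0
```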

Variable types

Numeric: 6
Categorical: 30

Alerts

Dataset has 5 (< 0.1%) duplicate rows [Duplicates]
IN_TREINEIRO is highly overall correlated with TP_FAIXA_ETARIA and 1 other field [High correlation]
Q002 is highly overall correlated with Q004 [High correlation]
Q003 is highly overall correlated with Q004 [High correlation]
Q004 is highly overall correlated with Q002 and 1 other field [High correlation]
TP_ANO_CONCLUIU is highly overall correlated with TP_FAIXA_ETARIA [High correlation]
TP_ESCOLA is highly overall correlated with TP_ST_CONCLUSAO [High correlation]
TP_FAIXA_ETARIA is highly overall correlated with IN_TREINEIRO and 2 other fields [High correlation]
TP_ST_CONCLUSAO is highly overall correlated with IN_TREINEIRO and 2 other fields [High correlation]
TP_ESTADO_CIVIL is highly imbalanced (78.4%) [Imbalance]
TP_NACIONALIDADE is highly imbalanced (92.4%) [Imbalance]
Q007 is highly imbalanced (63.9%) [Imbalance]
Q009 is highly imbalanced (68.9%) [Imbalance]
Q012 is highly imbalanced (72.7%) [Imbalance]
Q015 is highly imbalanced (61.5%) [Imbalance]
Q017 is highly imbalanced (84.2%) [Imbalance]
Q022 is highly imbalanced (54.7%) [Imbalance]
Q025 is highly imbalanced (60.9%) [Imbalance]
TP_COR_RACA has 15812 (1.6%) zeros [Zeros]
TP_ANO_CONCLUIU has 620104 (62.0%) zeros [Zeros]
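
The report does not say which coefficient or threshold produced the "High correlation" alerts. For pairs of categorical codes such as TP_ESCOLA and TP_ST_CONCLUSAO, Cramér's V is one common association measure. The sketch below assumes that measure, without bias correction, so its values may not match the report exactly. `cramers_v` is a hypothetical helper.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V between two equal-length categorical sequences (no bias correction)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    k = min(len(row), len(col))
    if k < 2:
        return 0.0  # one side is constant: no association can be measured
    # Pearson chi-squared: sum over every cell of (observed - expected)^2 / expected
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# Perfectly associated codes give V = 1; independent codes give V = 0
print(cramers_v([0] * 50 + [1] * 50, [3] * 50 + [4] * 50))  # 1.0
print(cramers_v([0, 0, 1, 1] * 25, [0, 1, 0, 1] * 25))  # 0.0
```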

Reproduction

Analysis started: 2024-04-15 03:28:05.601523
Analysis finished: 2024-04-15 03:29:20.096321
Duration: 1 minute and 14.49 seconds
Software version: ydata-profiling v0.0.dev0
Configuration file: config.json
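
The reported duration is the difference between the two timestamps. This short check uses only the standard library:

```python
from datetime import datetime

started = datetime.fromisoformat("2024-04-15 03:28:05.601523")
finished = datetime.fromisoformat("2024-04-15 03:29:20.096321")

elapsed = (finished - started).total_seconds()  # about 74.494798 seconds
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minute and {seconds:.2f} seconds")  # 1 minute and 14.49 seconds
```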

Variables

TP_FAIXA_ETARIA
Real number (ℝ)

HIGH CORRELATION

Distinct: 20
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.221059
Minimum: 1
Maximum: 20
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.3 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 3
Q3: 5
95th percentile: 12
Maximum: 20
Range: 19
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.3507723
Coefficient of variation (CV): 0.79382267
Kurtosis: 2.2593399
Mean: 4.221059
Median Absolute Deviation (MAD): 1
Skewness: 1.6851032
Sum: 4221059
Variance: 11.227675
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=20)]

Common values

Value              Count   Frequency (%)
3                  249816  25.0%
2                  247484  24.7%
4                  113460  11.3%
1                  110210  11.0%
5                  63935   6.4%
6                  40491   4.0%
11                 37393   3.7%
7                  28496   2.8%
8                  21052   2.1%
12                 20199   2.0%
Other values (10)  67464   6.7%

Smallest values

Value  Count   Frequency (%)
1      110210  11.0%
2      247484  24.7%
3      249816  25.0%
4      113460  11.3%
5      63935   6.4%
6      40491   4.0%
7      28496   2.8%
8      21052   2.1%
9      15538   1.6%
10     12728   1.3%

Largest values

Value  Count  Frequency (%)
20     138    < 0.1%
19     347    < 0.1%
18     876    0.1%
17     2203   0.2%
16     3987   0.4%
15     6529   0.7%
14     10418  1.0%
13     14700  1.5%
12     20199  2.0%
11     37393  3.7%
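
Every value of TP_FAIXA_ETARIA appears in the frequency tables, so the descriptive statistics can be cross-checked from the counts alone. `describe_from_counts` is a hypothetical helper. The variance uses the sample definition, with n − 1 in the denominator. That choice reproduces the reported 11.227675; dividing by n instead gives about 11.227664.

```python
import math


def describe_from_counts(counts):
    """Mean, sample variance (n - 1 denominator), standard deviation and
    coefficient of variation for a discrete variable given as {value: count}."""
    n = sum(counts.values())
    mean = sum(v * c for v, c in counts.items()) / n
    # Squared deviations, weighted by how often each value occurs
    ss = sum(c * (v - mean) ** 2 for v, c in counts.items())
    variance = ss / (n - 1)
    std = math.sqrt(variance)
    return {"n": n, "mean": mean, "variance": variance, "std": std, "cv": std / mean}


# Value counts for TP_FAIXA_ETARIA, taken from the tables above
tp_faixa_etaria = {
    1: 110210, 2: 247484, 3: 249816, 4: 113460, 5: 63935,
    6: 40491, 7: 28496, 8: 21052, 9: 15538, 10: 12728,
    11: 37393, 12: 20199, 13: 14700, 14: 10418, 15: 6529,
    16: 3987, 17: 2203, 18: 876, 19: 347, 20: 138,
}
stats = describe_from_counts(tp_faixa_etaria)
print(stats)
```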

TP_SEXO
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000000
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count   Frequency (%)
0      612383  61.2%
1      387617  38.8%

Length (Plot)

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Most occurring characters, and the most frequent character per category, script and block: identical to the Common Values table, since every value is a single character.

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000000  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000000  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000000  100.0%
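
Every TP_SEXO value is a single ASCII digit, so all 1,000,000 characters fall into one Unicode category. "Decimal Number" is the long name of the general category `Nd`, which Python's `unicodedata.category` returns for each digit. The sketch below counts characters per category. It assumes the column is an iterable of strings or numbers. `category_counts` is a hypothetical helper. Script and block lookups are not available in the standard library, so they are left out.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count the characters of a text column by Unicode general category."""
    counts = Counter()
    for value in values:
        for ch in str(value):
            counts[unicodedata.category(ch)] += 1
    return counts


print(category_counts(["0", "1", "1", "0"]))  # Counter({'Nd': 4})
```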

TP_ESTADO_CIVIL
Categorical

IMBALANCE

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000000
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count   Frequency (%)
1      924583  92.5%
2      33937   3.4%
0      29197   2.9%
3      11554   1.2%
4      729     0.1%

Length (Plot)

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Most occurring characters, and the most frequent character per category, script and block: identical to the Common Values table, since every value is a single character.

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000000  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000000  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000000  100.0%
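
Assume the imbalance percentage is the entropy-based score 1 − H(p)/ln k, where p holds the category shares and k is the number of categories. Under that assumption, the sketch below reproduces the 78.4% reported for TP_ESTADO_CIVIL from its counts, and the 63.9% reported for Q007. The formula is an assumption that matches the reported figures; it is not confirmed as ydata-profiling's implementation. `imbalance_score` is a hypothetical name.

```python
import math


def imbalance_score(counts):
    """1 - H(p) / ln(k): 0 for a uniform distribution over k classes,
    approaching 1 as a single class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0
    n = sum(counts)
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return 1 - entropy / math.log(k)


# TP_ESTADO_CIVIL value counts from the Common Values table above
print(round(imbalance_score([924583, 33937, 29197, 11554, 729]), 3))  # 0.784
```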

TP_COR_RACA
Real number (ℝ)

ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.977699
Minimum: 0
Maximum: 5
Zeros: 15812
Zeros (%): 1.6%
Negative: 0
Negative (%): 0.0%
Memory size: 15.3 MiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 1
Median: 2
Q3: 3
95th percentile: 3
Maximum: 5
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.0174285
Coefficient of variation (CV): 0.51445062
Kurtosis: -1.3206045
Mean: 1.977699
Median Absolute Deviation (MAD): 1
Skewness: 0.16764677
Sum: 1977699
Variance: 1.0351607
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=6)]

Common values

Value  Count   Frequency (%)
1      448821  44.9%
3      406770  40.7%
2      105255  10.5%
4      18652   1.9%
0      15812   1.6%
5      4690    0.5%

Smallest values

Value  Count   Frequency (%)
0      15812   1.6%
1      448821  44.9%
2      105255  10.5%
3      406770  40.7%
4      18652   1.9%
5      4690    0.5%

Largest values

Value  Count   Frequency (%)
5      4690    0.5%
4      18652   1.9%
3      406770  40.7%
2      105255  10.5%
1      448821  44.9%
0      15812   1.6%
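
The ZEROS alert compares each column's share of zeros with a threshold that the report does not show. The sketch below computes the zero count and zero share for each column and flags columns above `min_share`. The 1% default is an illustrative assumption, chosen only so that a column like TP_COR_RACA (1.6% zeros) would be flagged. `zero_report` is a hypothetical helper.

```python
def zero_report(columns, min_share=0.01):
    """Return {name: (zero_count, zero_share)} for columns whose share of zeros
    exceeds min_share. min_share is an illustrative threshold, not the one
    used to produce this report."""
    flagged = {}
    for name, values in columns.items():
        zeros = sum(1 for v in values if v == 0)
        share = zeros / len(values)
        if share > min_share:
            flagged[name] = (zeros, share)
    return flagged


columns = {"a": [0, 1, 2, 0], "b": [1, 2, 3, 4], "c": [0] + [1] * 199}
print(zero_report(columns))  # {'a': (2, 0.5)}
```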

TP_NACIONALIDADE
Categorical

IMBALANCE

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000000
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count   Frequency (%)
1      977228  97.7%
2      18778   1.9%
4      2195    0.2%
3      1509    0.2%
0      290     < 0.1%

Length (Plot)

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Most occurring characters, and the most frequent character per category, script and block: identical to the Common Values table, since every value is a single character.

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000000  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000000  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000000  100.0%

TP_ST_CONCLUSAO
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000000
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value  Count   Frequency (%)
1      412561  41.3%
2      400724  40.1%
3      183679  18.4%
4      3036    0.3%

Length (Plot)

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Most occurring characters, and the most frequent character per category, script and block: identical to the Common Values table, since every value is a single character.

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000000  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000000  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000000  100.0%

TP_ANO_CONCLUIU
Real number (ℝ)

HIGH CORRELATION, ZEROS

Distinct: 17
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.750586
Minimum: 0
Maximum: 16
Zeros: 620104
Zeros (%): 62.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.3 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 2
95th percentile: 11
Maximum: 16
Range: 16
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 3.6401167
Coefficient of variation (CV): 2.0793704
Kurtosis: 6.8120528
Mean: 1.750586
Median Absolute Deviation (MAD): 0
Skewness: 2.6983524
Sum: 1750586
Variance: 13.25045
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=17)]

Common values

Value             Count   Frequency (%)
0                 620104  62.0%
1                 127661  12.8%
2                 56276   5.6%
3                 42488   4.2%
16                30840   3.1%
4                 28319   2.8%
5                 21150   2.1%
6                 15082   1.5%
7                 11896   1.2%
8                 9659    1.0%
Other values (7)  36525   3.7%

Smallest values

Value  Count   Frequency (%)
0      620104  62.0%
1      127661  12.8%
2      56276   5.6%
3      42488   4.2%
4      28319   2.8%
5      21150   2.1%
6      15082   1.5%
7      11896   1.2%
8      9659    1.0%
9      7648    0.8%

Largest values

Value  Count  Frequency (%)
16     30840  3.1%
15     3540   0.4%
14     3843   0.4%
13     4502   0.5%
12     4942   0.5%
11     5343   0.5%
10     6707   0.7%
9      7648   0.8%
8      9659   1.0%
7      11896  1.2%
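
Because 62.0% of TP_ANO_CONCLUIU is 0, the 5th percentile, Q1 and the median all collapse to 0. The upper quantiles come from walking the cumulative counts. Below is a minimal inverse-CDF sketch over the value counts above. `quantile_from_counts` is a hypothetical helper. pandas' default linear interpolation can differ when q·n falls exactly on the boundary between two values. For these counts, both methods give the same five quantiles.

```python
def quantile_from_counts(counts, q):
    """Smallest value whose cumulative count reaches q * n (inverse-CDF quantile)."""
    n = sum(counts.values())
    target = q * n
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        if cumulative >= target:
            return value
    return max(counts)


tp_ano_concluiu = {
    0: 620104, 1: 127661, 2: 56276, 3: 42488, 4: 28319, 5: 21150,
    6: 15082, 7: 11896, 8: 9659, 9: 7648, 10: 6707, 11: 5343,
    12: 4942, 13: 4502, 14: 3843, 15: 3540, 16: 30840,
}
for q in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(q, quantile_from_counts(tp_ano_concluiu, q))
# 0.05 0, 0.25 0, 0.5 0, 0.75 2, 0.95 11 (one pair per line)
```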

TP_ESCOLA
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000000
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value  Count   Frequency (%)
1      599276  59.9%
2      312405  31.2%
3      88319   8.8%

Length (Plot)

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Most occurring characters, and the most frequent character per category, script and block: identical to the Common Values table, since every value is a single character.

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000000  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000000  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000000  100.0%

IN_TREINEIRO
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000000
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count   Frequency (%)
0      816321  81.6%
1      183679  18.4%

Length (Plot)

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Most occurring characters, and the most frequent character per category, script and block: identical to the Common Values table, since every value is a single character.

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000000  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000000  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000000  100.0%

TP_LINGUA
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000000
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value  Count   Frequency (%)
0      586152  58.6%
1      413848  41.4%

Length (Plot)

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Most occurring characters, and the most frequent character per category, script and block: identical to the Common Values table, since every value is a single character.

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000000  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000000  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000000  100.0%

Q001
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000000
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 3
3rd row: 3
4th row: 2
5th row: 3

Common Values

Value  Count   Frequency (%)
3      544956  54.5%
2      256613  25.7%
1      198431  19.8%

Length (Plot)

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Most occurring characters, and the most frequent character per category, script and block: identical to the Common Values table, since every value is a single character.

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000000  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000000  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000000  100.0%

Q002
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000000
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 3
4th row: 2
5th row: 2

Common Values

Value  Count   Frequency (%)
2      582136  58.2%
3      293473  29.3%
1      124391  12.4%

Length (Plot)

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Most occurring characters, and the most frequent character per category, script and block: identical to the Common Values table, since every value is a single character.

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000000  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000000  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000000  100.0%

Q003
Real number (ℝ)

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.110358
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.3 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 3
Q3: 4
95th percentile: 6
Maximum: 6
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.4568477
Coefficient of variation (CV): 0.46838586
Kurtosis: -0.7696353
Mean: 3.110358
Median Absolute Deviation (MAD): 1
Skewness: 0.22127323
Sum: 3110358
Variance: 2.1224052
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=6)]

Common values

Value  Count   Frequency (%)
4      238861  23.9%
3      233517  23.4%
2      192304  19.2%
1      169816  17.0%
5      93073   9.3%
6      72429   7.2%

Smallest values

Value  Count   Frequency (%)
1      169816  17.0%
2      192304  19.2%
3      233517  23.4%
4      238861  23.9%
5      93073   9.3%
6      72429   7.2%

Largest values

Value  Count   Frequency (%)
6      72429   7.2%
5      93073   9.3%
4      238861  23.9%
3      233517  23.4%
2      192304  19.2%
1      169816  17.0%
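
Q003 has six distinct integer codes spread over the range 1 to 6. With bins=6, each equal-width bin therefore holds exactly one code, which is why the histogram mirrors the value-count table. The following minimal sketch implements equal-width binning. It follows the convention documented for `numpy.histogram`: every bin is half-open except the last, which also includes the maximum. `fixed_width_histogram` is a hypothetical helper, not ydata-profiling's code.

```python
def fixed_width_histogram(values, bins):
    """Equal-width histogram over [min, max]. Each bin is half-open [left, right)
    except the last, which also contains the maximum value."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    for v in values:
        if hi == lo:
            idx = 0  # all values identical: put everything in the first bin
        else:
            idx = int((v - lo) * bins / (hi - lo))
            idx = min(idx, bins - 1)  # the maximum lands on the right edge
        counts[idx] += 1
    return counts


print(fixed_width_histogram([1, 2, 2, 3, 4, 4, 4, 5, 6, 6], bins=6))  # [1, 2, 1, 3, 1, 2]
```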

Q004
Real number (ℝ)

HIGH CORRELATION

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.999099
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 15.3 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 2
Q3: 4
95th percentile: 6
Maximum: 6
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.4617005
Coefficient of variation (CV): 0.48737987
Kurtosis: -0.82931442
Mean: 2.999099
Median Absolute Deviation (MAD): 1
Skewness: 0.44834455
Sum: 2999099
Variance: 2.1365683
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=6)]

Common values

Value  Count   Frequency (%)
2      379187  37.9%
4      285647  28.6%
1      132872  13.3%
6      74493   7.4%
5      67452   6.7%
3      60349   6.0%

Smallest values

Value  Count   Frequency (%)
1      132872  13.3%
2      379187  37.9%
3      60349   6.0%
4      285647  28.6%
5      67452   6.7%
6      74493   7.4%

Largest values

Value  Count   Frequency (%)
6      74493   7.4%
5      67452   6.7%
4      285647  28.6%
3      60349   6.0%
2      379187  37.9%
1      132872  13.3%
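
Q003 and Q004 are profiled as numeric ordinal codes, and the report flags them as highly correlated without naming the coefficient. For ordinal codes like these, one common choice is Spearman's rank correlation: the Pearson correlation of the ranks, with tied values given their average rank. The sketch below assumes that coefficient. `average_ranks` and `spearman` are hypothetical helpers.

```python
import math


def average_ranks(xs):
    """1-based ranks; tied values get the mean of the ranks they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]]
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Same ordering, ties included: rho is 1 up to floating-point rounding
print(spearman([1, 2, 2, 3, 5], [2, 3, 3, 4, 6]))
```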

Q005
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000000
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 4
3rd row: 2
4th row: 3
5th row: 4

Common Values

Value  Count   Frequency (%)
4      584479  58.4%
3      281161  28.1%
2      114728  11.5%
1      19632   2.0%

Length (Plot)

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Most occurring characters, and the most frequent character per category, script and block: identical to the Common Values table, since every value is a single character.

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000000  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000000  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000000  100.0%

Q006
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000000
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value  Count   Frequency (%)
2      613086  61.3%
3      209566  21.0%
4      71142   7.1%
5      59012   5.9%
1      47194   4.7%

Length (Plot)

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Most occurring characters, and the most frequent character per category, script and block: identical to the Common Values table, since every value is a single character.

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000000  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000000  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000000  100.0%

Q007
Categorical

IMBALANCE

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000000
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count   Frequency (%)
1      899095  89.9%
2      54734   5.5%
3      46171   4.6%

Length (Plot)

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Most occurring characters, and the most frequent character per category, script and block: identical to the Common Values table, since every value is a single character.

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000000  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000000  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000000  100.0%

Q008
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000000
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 3
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value  Count   Frequency (%)
2      591320  59.1%
3      402856  40.3%
1      5824    0.6%

Length (Plot)

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Most occurring characters, and the most frequent character per category, script and block: identical to the Common Values table, since every value is a single character.

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000000  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000000  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000000  100.0%

Q009
Categorical

IMBALANCE

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000000
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 3
3rd row: 3
4th row: 2
5th row: 3

Common Values

Value  Count   Frequency (%)
3      902075  90.2%
2      92585   9.3%
1      5340    0.5%

Length (Plot)

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Most occurring characters, and the most frequent character per category, script and block: identical to the Common Values table, since every value is a single character.

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000000  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000000  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000000  100.0%

Q010
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 15.3 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1000000
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 2
3rd row: 2
4th row: 1
5th row: 1

Common Values

Value  Count   Frequency (%)
1      448824  44.9%
2      422128  42.2%
3      129048  12.9%

Length (Plot)

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values listed above]

Most occurring characters, and the most frequent character per category, script and block: identical to the Common Values table, since every value is a single character.

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1000000  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1000000  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1000000  100.0%

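The Characters and Unicode figures in these categorical sections can be checked with Python's standard unicodedata module. The sketch below, which takes the five Sample rows of Q010 as input, tallies characters, their Unicode general category (the digits have category Nd, which the report labels Decimal Number) and how many fall below code point 128, i.e. in the ASCII range. The standard library does not expose the Unicode Script property, so the report's Common script count is not reproduced; the helper name char_profile is hypothetical.

```python
import unicodedata
from collections import Counter


def char_profile(values):
    """Tally characters, Unicode general categories and ASCII code points in a list of strings."""
    chars = Counter(ch for value in values for ch in value)
    categories = Counter()
    for ch, count in chars.items():
        # unicodedata.category("1") returns "Nd" (Number, decimal digit).
        categories[unicodedata.category(ch)] += count
    # Code points 0-127 form the ASCII range.
    ascii_count = sum(count for ch, count in chars.items() if ord(ch) < 128)
    return chars, categories, ascii_count


# The five "Sample" rows shown for Q010.
chars, categories, ascii_count = char_profile(["1", "2", "2", "1", "1"])
print(chars.most_common())  # [('1', 3), ('2', 2)]
print(dict(categories))     # {'Nd': 5}
print(ascii_count)          # 5
```

Because every value in these columns is a single ASCII digit, the character tables necessarily mirror the Common Values tables.
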
Q011
Categorical

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 15.3 MiB
1 743468
2 226976
3 29556

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1000000
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1
2nd row 2
3rd row 1
4th row 1
5th row 2

Common Values

Value Count Frequency (%)
1 743468 74.3%
2 226976 22.7%
3 29556 3.0%

Most occurring characters

Value Count Frequency (%)
1 743468 74.3%
2 226976 22.7%
3 29556 3.0%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1000000 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1000000 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1000000 100.0%

Q012
Categorical

IMBALANCE 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 15.3 MiB
2 924290
3 64210
1 11500

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1000000
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 2
2nd row 2
3rd row 2
4th row 2
5th row 2

Common Values

Value Count Frequency (%)
2 924290 92.4%
3 64210 6.4%
1 11500 1.1%

Most occurring characters

Value Count Frequency (%)
2 924290 92.4%
3 64210 6.4%
1 11500 1.1%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1000000 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1000000 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1000000 100.0%

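The IMBALANCE badge and the percentages in the overview alerts (for example 72.7% for Q012) are consistent with a normalized-entropy score: one minus the Shannon entropy of the category frequencies divided by log(k), where k is the number of distinct values. We take this to be how ydata-profiling scores imbalance; the only confirmation offered here is that the sketch below reproduces the reported figures for Q012, Q017 and Q025 from the counts in this report. The helper name imbalance_score is hypothetical.

```python
import math


def imbalance_score(counts):
    """Return 1 - H(p) / log(k): 0 for a uniform distribution, approaching 1 as one value dominates."""
    k = len(counts)
    if k < 2:
        return 0.0  # a single category gives no imbalance to measure
    n = sum(counts)
    entropy = -sum((c / n) * math.log(c / n) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# Category counts copied from the Common Values tables of this report.
for name, counts in {
    "Q012": [924290, 64210, 11500],
    "Q017": [959523, 39748, 729],
    "Q025": [923101, 76899],
}.items():
    print(f"{name}: {imbalance_score(counts):.1%}")
# Q012: 72.7%
# Q017: 84.2%
# Q025: 60.9%
```

Dividing by log(k) makes the score comparable between two-valued columns such as Q025 and three-valued ones such as Q012.
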
Q013
Categorical

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 15.3 MiB
1 494365
2 460624
3 45011

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1000000
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 2
2nd row 1
3rd row 1
4th row 1
5th row 2

Common Values

Value Count Frequency (%)
1 494365 49.4%
2 460624 46.1%
3 45011 4.5%

Most occurring characters

Value Count Frequency (%)
1 494365 49.4%
2 460624 46.1%
3 45011 4.5%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1000000 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1000000 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1000000 100.0%

Q014
Categorical

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 15.3 MiB
2 661972
1 324563
3 13465

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1000000
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 2
2nd row 2
3rd row 1
4th row 1
5th row 2

Common Values

Value Count Frequency (%)
2 661972 66.2%
1 324563 32.5%
3 13465 1.3%

Most occurring characters

Value Count Frequency (%)
2 661972 66.2%
1 324563 32.5%
3 13465 1.3%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1000000 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1000000 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1000000 100.0%

Q015
Categorical

IMBALANCE 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 15.3 MiB
1 855613
2 142501
3 1886

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1000000
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Common Values

Value Count Frequency (%)
1 855613 85.6%
2 142501 14.3%
3 1886 0.2%

Most occurring characters

Value Count Frequency (%)
1 855613 85.6%
2 142501 14.3%
3 1886 0.2%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1000000 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1000000 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1000000 100.0%

Q016
Categorical

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 15.3 MiB
2 538141
1 453037
3 8822

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1000000
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1
2nd row 2
3rd row 1
4th row 1
5th row 1

Common Values

Value Count Frequency (%)
2 538141 53.8%
1 453037 45.3%
3 8822 0.9%

Most occurring characters

Value Count Frequency (%)
2 538141 53.8%
1 453037 45.3%
3 8822 0.9%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1000000 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1000000 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1000000 100.0%

Q017
Categorical

IMBALANCE 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 15.3 MiB
1 959523
2 39748
3 729

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1000000
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Common Values

Value Count Frequency (%)
1 959523 96.0%
2 39748 4.0%
3 729 0.1%

Most occurring characters

Value Count Frequency (%)
1 959523 96.0%
2 39748 4.0%
3 729 0.1%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1000000 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1000000 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1000000 100.0%

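The Common Values rows, the Distinct and Unique counts and the "< 0.1%" notation used throughout can all be derived from a value-to-count mapping. The sketch below reproduces the Q017 figures. It relies on two assumptions that the report's own numbers support: Unique counts the values that occur exactly once (hence 0 for every column here), and a share that is non-zero but would round to 0.0% is printed as "< 0.1%". The function names are hypothetical.

```python
from collections import Counter


def format_pct(fraction):
    """One-decimal percentage; tiny non-zero shares are shown as '< 0.1%'."""
    pct = 100 * fraction
    if pct > 0 and round(pct, 1) == 0:
        return "< 0.1%"
    return f"{pct:.1f}%"


def profile_categorical(counts):
    """Summarize a Counter of category -> count like the report's categorical sections."""
    n = sum(counts.values())
    return {
        # most_common() orders rows by descending count, as in Common Values.
        "rows": [(value, count, format_pct(count / n)) for value, count in counts.most_common()],
        "distinct": len(counts),
        "distinct_pct": format_pct(len(counts) / n),
        "unique": sum(1 for count in counts.values() if count == 1),  # values seen exactly once
    }


# Counts copied from the Q017 section.
summary = profile_categorical(Counter({"1": 959523, "2": 39748, "3": 729}))
for value, count, pct in summary["rows"]:
    print(value, count, pct)
print(summary["distinct"], summary["distinct_pct"], summary["unique"])
```

With three distinct values among 1,000,000 rows, Distinct (%) is 0.0003%, which is why every categorical section reports "< 0.1%".
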
Q018
Categorical

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 15.3 MiB
0 710781
1 289219

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1000000
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Common Values

Value Count Frequency (%)
0 710781 71.1%
1 289219 28.9%

Most occurring characters

Value Count Frequency (%)
0 710781 71.1%
1 289219 28.9%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1000000 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1000000 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1000000 100.0%

Q019
Categorical

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 15.3 MiB
2 614539
3 334715
1 50746

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1000000
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1
2nd row 2
3rd row 2
4th row 2
5th row 2

Common Values

Value Count Frequency (%)
2 614539 61.5%
3 334715 33.5%
1 50746 5.1%

Most occurring characters

Value Count Frequency (%)
2 614539 61.5%
3 334715 33.5%
1 50746 5.1%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1000000 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1000000 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1000000 100.0%

Q020
Categorical

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 15.3 MiB
0 811721
1 188279

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1000000
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0
2nd row 1
3rd row 0
4th row 0
5th row 0

Common Values

Value Count Frequency (%)
0 811721 81.2%
1 188279 18.8%

Most occurring characters

Value Count Frequency (%)
0 811721 81.2%
1 188279 18.8%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1000000 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1000000 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1000000 100.0%

Q021
Categorical

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 15.3 MiB
0 742535
1 257465

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1000000
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Common Values

Value Count Frequency (%)
0 742535 74.3%
1 257465 25.7%

Most occurring characters

Value Count Frequency (%)
0 742535 74.3%
1 257465 25.7%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1000000 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1000000 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1000000 100.0%

Q022
Categorical

IMBALANCE 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 15.3 MiB
3 840786
2 139644
1 19570

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1000000
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 3
2nd row 3
3rd row 3
4th row 3
5th row 3

Common Values

Value Count Frequency (%)
3 840786 84.1%
2 139644 14.0%
1 19570 2.0%

Most occurring characters

Value Count Frequency (%)
3 840786 84.1%
2 139644 14.0%
1 19570 2.0%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1000000 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1000000 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1000000 100.0%

Q023
Categorical

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 15.3 MiB
0 858584
1 141416

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1000000
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Common Values

Value Count Frequency (%)
0 858584 85.9%
1 141416 14.1%

Most occurring characters

Value Count Frequency (%)
0 858584 85.9%
1 141416 14.1%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1000000 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1000000 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1000000 100.0%

Q024
Categorical

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 15.3 MiB
1 405000
2 403871
3 191129

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1000000
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 2
2nd row 2
3rd row 1
4th row 1
5th row 2

Common Values

Value Count Frequency (%)
1 405000 40.5%
2 403871 40.4%
3 191129 19.1%

Most occurring characters

Value Count Frequency (%)
1 405000 40.5%
2 403871 40.4%
3 191129 19.1%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1000000 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1000000 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1000000 100.0%

Q025
Categorical

IMBALANCE 

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 15.3 MiB
1 923101
0 76899

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1000000
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1
2nd row 1
3rd row 1
4th row 0
5th row 1

Common Values

Value Count Frequency (%)
1 923101 92.3%
0 76899 7.7%

Most occurring characters

Value Count Frequency (%)
1 923101 92.3%
0 76899 7.7%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1000000 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 1000000 100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 1000000 100.0%

MEDIAS
Real number (ℝ)

Distinct 45208
Distinct (%) 4.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 546.88478
Minimum 0
Maximum 855.98
Zeros 8
Zeros (%) < 0.1%
Negative 0
Negative (%) 0.0%
Memory size 15.3 MiB

Quantile statistics

Minimum 0
5th percentile 406.06
Q1 487.66
median 544.08
Q3 605.84
95th percentile 696.14
Maximum 855.98
Range 855.98
Interquartile range (IQR) 118.18

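The Range and IQR rows are derived from the other quantiles: Range = Maximum - Minimum = 855.98 and IQR = Q3 - Q1 = 605.84 - 487.66 = 118.18. We assume the percentiles themselves come from pandas' Series.quantile with its default linear interpolation; the plain-Python sketch below implements that rule on a toy input so the definitions are explicit. The helper names are hypothetical.

```python
import math


def quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics (the NumPy/pandas 'linear' rule)."""
    if not sorted_values:
        raise ValueError("quantile of empty data")
    pos = (len(sorted_values) - 1) * q  # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def quantile_statistics(values):
    """Compute the rows of a 'Quantile statistics' block for a numeric column."""
    xs = sorted(values)
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    return {
        "Minimum": xs[0],
        "5th percentile": quantile(xs, 0.05),
        "Q1": q1,
        "median": quantile(xs, 0.5),
        "Q3": q3,
        "95th percentile": quantile(xs, 0.95),
        "Maximum": xs[-1],
        "Range": xs[-1] - xs[0],
        "Interquartile range (IQR)": q3 - q1,
    }


stats = quantile_statistics([1, 2, 3, 4])
print(stats["Q1"], stats["median"], stats["Q3"], stats["Interquartile range (IQR)"])
# 1.75 2.5 3.25 1.5
```
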
Descriptive statistics

Standard deviation 88.021571
Coefficient of variation (CV) 0.16095085
Kurtosis -0.032176653
Mean 546.88478
Median Absolute Deviation (MAD) 58.9
Skewness 0.024465885
Sum 5.4688478 × 10^8
Variance 7747.7969
Monotonicity Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value Count Frequency (%)
534.74 120 < 0.1%
542.7 118 < 0.1%
520.7 117 < 0.1%
562.8 117 < 0.1%
512.76 117 < 0.1%
518.28 116 < 0.1%
514.72 116 < 0.1%
533.34 116 < 0.1%
530.6 116 < 0.1%
552.3 115 < 0.1%
Other values (45198) 998832 99.9%

Minimum 10 values

Value Count Frequency (%)
0 8 < 0.1%
56.14 1 < 0.1%
66.1 1 < 0.1%
72.12 1 < 0.1%
89.12 1 < 0.1%
116 1 < 0.1%
127.12 1 < 0.1%
131.66 1 < 0.1%
136.24 1 < 0.1%
136.44 1 < 0.1%

Maximum 10 values

Value Count Frequency (%)
855.98 1 < 0.1%
855.82 1 < 0.1%
851.84 1 < 0.1%
849.86 1 < 0.1%
839.98 1 < 0.1%
839.54 1 < 0.1%
837.56 1 < 0.1%
837.06 1 < 0.1%
836.84 1 < 0.1%
836.76 1 < 0.1%

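Several MEDIAS descriptive statistics are functions of the others, which gives a quick integrity check on a copied or exported report: CV = standard deviation / mean, Variance = standard deviation squared, and Sum = mean × number of observations (1,000,000, from the dataset overview). The sketch recomputes these from the reported values; the relative tolerance of 1e-6 is our own choice, meant to absorb the rounding of the printed figures.

```python
# Values copied from the MEDIAS section and the dataset overview of this report.
N_OBSERVATIONS = 1_000_000
MEAN = 546.88478
STD = 88.021571

reported = {"cv": 0.16095085, "variance": 7747.7969, "sum": 5.4688478e8}
derived = {
    "cv": STD / MEAN,              # coefficient of variation = sigma / mu
    "variance": STD ** 2,          # variance = sigma squared
    "sum": MEAN * N_OBSERVATIONS,  # sum = mu * n
}

for name, value in derived.items():
    rel_err = abs(value - reported[name]) / abs(reported[name])
    status = "ok" if rel_err < 1e-6 else "MISMATCH"
    print(f"{name:<9} derived={value:.8g} reported={reported[name]:.8g} {status}")
```

Skewness, kurtosis and MAD cannot be checked this way, because they need the raw values.
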
Interactions

Correlations

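The High correlation alerts in the report overview can be cross-checked against the matrix that follows. Applying a magnitude cut-off of 0.5 to the three-decimal values shown flags exactly the eight variables named in those alerts. We assume 0.5 is the threshold in effect (to our knowledge it is ydata-profiling's default). The >= comparison is our choice: TP_FAIXA_ETARIA and TP_ST_CONCLUSAO display as 0.500 yet are flagged, so their unrounded value presumably sits just above the cut-off. The sketch uses a hand-copied subset of pairs, and the helper name is hypothetical.

```python
from collections import defaultdict

# (variable_a, variable_b, coefficient) copied from the correlation table below.
PAIRS = [
    ("IN_TREINEIRO", "TP_FAIXA_ETARIA", -0.591),
    ("IN_TREINEIRO", "TP_ST_CONCLUSAO", 1.000),
    ("Q002", "Q004", 0.524),
    ("Q003", "Q004", 0.537),
    ("TP_ANO_CONCLUIU", "TP_FAIXA_ETARIA", 0.759),
    ("TP_ESCOLA", "TP_ST_CONCLUSAO", 0.707),
    ("TP_FAIXA_ETARIA", "TP_ST_CONCLUSAO", 0.500),
    # Strongest remaining pairs, all below the cut-off:
    ("Q006", "Q018", 0.497),
    ("Q018", "Q024", 0.483),
    ("Q001", "Q003", 0.446),
    ("TP_ANO_CONCLUIU", "TP_ST_CONCLUSAO", 0.400),
]


def high_correlation_alerts(pairs, threshold=0.5):
    """Map each variable to the sorted partners whose |coefficient| reaches the threshold."""
    partners = defaultdict(set)
    for a, b, r in pairs:
        if abs(r) >= threshold:  # magnitude, so strong negative pairs are flagged too
            partners[a].add(b)
            partners[b].add(a)
    return {var: sorted(others) for var, others in sorted(partners.items())}


alerts = high_correlation_alerts(PAIRS)
for var, others in alerts.items():
    print(f"{var} is highly overall correlated with {', '.join(others)}")
```
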
IN_TREINEIRO MEDIAS Q001 Q002 Q003 Q004 Q005 Q006 Q007 Q008 Q009 Q010 Q011 Q012 Q013 Q014 Q015 Q016 Q017 Q018 Q019 Q020 Q021 Q022 Q023 Q024 Q025 TP_ANO_CONCLUIU TP_COR_RACA TP_ESCOLA TP_ESTADO_CIVIL TP_FAIXA_ETARIA TP_LINGUA TP_NACIONALIDADE TP_SEXO TP_ST_CONCLUSAO
IN_TREINEIRO 1.000 0.006 0.140 0.170 0.108 0.121 0.084 0.169 0.116 0.143 0.077 0.155 0.027 0.068 0.085 0.082 0.046 0.093 0.062 0.105 0.116 0.068 0.115 0.068 0.043 0.124 0.052 -0.358 -0.069 0.388 0.087 -0.591 0.111 0.013 0.036 1.000
MEDIAS 0.006 1.000 0.220 0.248 0.296 0.299 0.024 0.225 0.156 0.227 0.087 0.230 0.051 0.093 0.186 0.177 0.087 0.185 0.110 0.290 0.199 0.133 0.202 0.140 0.165 0.307 0.190 0.070 -0.231 0.190 0.039 -0.071 0.278 0.034 0.057 0.063
Q001 0.140 0.220 1.000 0.402 0.446 0.391 0.052 0.294 0.127 0.236 0.116 0.244 0.047 0.102 0.175 0.228 0.096 0.206 0.094 0.298 0.215 0.118 0.243 0.162 0.161 0.268 0.209 -0.153 -0.189 0.148 0.104 -0.260 0.236 0.023 0.058 0.134
Q002 0.170 0.248 0.402 1.000 0.376 0.524 0.058 0.358 0.199 0.267 0.126 0.276 0.029 0.117 0.172 0.212 0.108 0.201 0.132 0.306 0.220 0.136 0.273 0.161 0.161 0.310 0.204 -0.161 -0.193 0.180 0.128 -0.287 0.228 0.024 0.057 0.154
Q003 0.108 0.296 0.446 0.376 1.000 0.537 0.044 0.340 0.284 0.320 0.116 0.337 0.105 0.163 0.225 0.279 0.147 0.264 0.196 0.389 0.287 0.192 0.349 0.182 0.220 0.368 0.247 -0.113 -0.193 0.219 0.044 -0.184 0.275 0.021 0.063 0.105
Q004 0.121 0.299 0.391 0.524 0.537 1.000 0.040 0.314 0.282 0.303 0.115 0.319 0.089 0.150 0.211 0.275 0.136 0.254 0.177 0.355 0.265 0.170 0.325 0.183 0.198 0.351 0.253 -0.121 -0.193 0.201 0.049 -0.205 0.256 0.020 0.062 0.108
Q005 0.084 0.024 0.052 0.058 0.044 0.040 1.000 0.078 0.050 0.095 0.216 0.142 0.076 0.073 0.066 0.065 0.028 0.056 0.026 0.073 0.135 0.059 0.076 0.235 0.061 0.077 0.055 -0.135 0.037 0.086 0.078 -0.147 0.045 0.007 0.029 0.106
Q006 0.169 0.225 0.294 0.358 0.340 0.314 0.078 1.000 0.361 0.380 0.168 0.436 0.067 0.220 0.280 0.305 0.191 0.300 0.244 0.497 0.354 0.250 0.434 0.216 0.262 0.444 0.276 -0.105 -0.265 0.242 0.020 -0.217 0.279 0.026 0.086 0.108
Q007 0.116 0.156 0.127 0.199 0.284 0.282 0.050 0.361 1.000 0.211 0.053 0.262 0.039 0.172 0.178 0.129 0.128 0.157 0.198 0.253 0.201 0.165 0.283 0.055 0.152 0.238 0.067 -0.096 -0.141 0.160 0.022 -0.144 0.144 0.015 0.036 0.093
Q008 0.143 0.227 0.236 0.267 0.320 0.303 0.095 0.380 0.211 1.000 0.228 0.359 0.038 0.237 0.239 0.282 0.143 0.270 0.138 0.400 0.336 0.189 0.334 0.196 0.212 0.338 0.248 -0.115 -0.210 0.172 0.037 -0.204 0.220 0.026 0.059 0.118
Q009 0.077 0.087 0.116 0.126 0.116 0.115 0.216 0.168 0.053 0.228 1.000 0.195 0.069 0.155 0.119 0.155 0.064 0.137 0.039 0.162 0.189 0.094 0.141 0.245 0.084 0.147 0.240 -0.111 -0.087 0.062 0.063 -0.165 0.094 0.009 0.019 0.087
Q010 0.155 0.230 0.244 0.276 0.337 0.319 0.142 0.436 0.262 0.359 0.195 1.000 0.046 0.223 0.288 0.336 0.173 0.320 0.176 0.478 0.331 0.213 0.359 0.208 0.237 0.378 0.265 -0.157 -0.266 0.173 0.036 -0.252 0.237 0.026 0.057 0.142
Q011 0.027 0.051 0.047 0.029 0.105 0.089 0.076 0.067 0.039 0.038 0.069 0.046 1.000 0.039 0.039 0.048 0.029 0.039 0.033 0.064 0.049 0.033 0.050 0.057 0.070 0.054 0.052 -0.041 0.057 0.055 0.017 -0.047 0.075 0.011 0.031 0.032
Q012 0.068 0.093 0.102 0.117 0.163 0.150 0.073 0.220 0.172 0.237 0.155 0.223 0.039 1.000 0.361 0.182 0.112 0.211 0.129 0.234 0.233 0.160 0.207 0.133 0.141 0.184 0.187 -0.076 -0.115 0.085 0.022 -0.109 0.105 0.010 0.036 0.065
Q013 0.085 0.186 0.175 0.172 0.225 0.211 0.066 0.280 0.178 0.239 0.119 0.288 0.039 0.361 1.000 0.284 0.202 0.283 0.138 0.365 0.257 0.213 0.290 0.168 0.185 0.271 0.210 -0.126 -0.206 0.112 0.061 -0.183 0.202 0.018 0.030 0.097
Q014 0.082 0.177 0.228 0.212 0.279 0.275 0.065 0.305 0.129 0.282 0.155 0.336 0.048 0.182 0.284 1.000 0.264 0.347 0.123 0.394 0.281 0.166 0.297 0.206 0.207 0.319 0.291 -0.113 -0.239 0.119 0.019 -0.162 0.223 0.023 0.063 0.093
Q015 0.046 0.087 0.096 0.108 0.147 0.136 0.028 0.191 0.128 0.143 0.064 0.173 0.029 0.112 0.202 0.264 1.000 0.163 0.192 0.253 0.150 0.145 0.203 0.076 0.100 0.173 0.097 -0.076 -0.114 0.075 0.023 -0.098 0.100 0.011 0.027 0.057
Q016 0.093 0.185 0.206 0.201 0.264 0.254 0.056 0.300 0.157 0.270 0.137 0.320 0.039 0.211 0.283 0.347 0.163 1.000 0.154 0.413 0.299 0.192 0.302 0.178 0.213 0.309 0.263 -0.107 -0.245 0.130 0.024 -0.167 0.224 0.023 0.057 0.094
Q017 0.062 0.110 0.094 0.132 0.196 0.177 0.026 0.244 0.198 0.138 0.039 0.176 0.033 0.129 0.138 0.123 0.192 0.154 1.000 0.236 0.131 0.147 0.186 0.037 0.126 0.179 0.053 -0.057 -0.118 0.109 0.011 -0.086 0.109 0.015 0.040 0.052
Q018 0.105 0.290 0.298 0.306 0.389 0.355 0.073 0.497 0.253 0.400 0.162 0.478 0.064 0.234 0.365 0.394 0.253 0.413 0.236 1.000 0.432 0.255 0.339 0.195 0.240 0.483 0.173 -0.113 -0.264 0.220 0.033 -0.175 0.240 0.036 0.052 0.142
Q019 0.116 0.199 0.215 0.220 0.287 0.265 0.135 0.354 0.201 0.336 0.189 0.331 0.049 0.233 0.257 0.281 0.150 0.299 0.131 0.432 1.000 0.253 0.386 0.210 0.245 0.333 0.234 -0.122 -0.214 0.167 0.046 -0.190 0.226 0.028 0.079 0.108
Q020 0.068 0.133 0.118 0.136 0.192 0.170 0.059 0.250 0.165 0.189 0.094 0.213 0.033 0.160 0.213 0.166 0.145 0.192 0.147 0.255 0.253 1.000 0.216 0.118 0.178 0.255 0.080 -0.080 -0.104 0.113 0.034 -0.109 0.114 0.014 0.018 0.095
Q021 0.115 0.202 0.243 0.273 0.349 0.325 0.076 0.434 0.283 0.334 0.141 0.359 0.050 0.207 0.290 0.297 0.203 0.302 0.186 0.339 0.386 0.216 1.000 0.168 0.231 0.375 0.151 -0.124 -0.166 0.207 0.042 -0.177 0.172 0.022 0.022 0.150
Q022 0.068 0.140 0.162 0.161 0.182 0.183 0.235 0.216 0.055 0.196 0.245 0.208 0.057 0.133 0.168 0.206 0.076 0.178 0.037 0.195 0.210 0.118 0.168 1.000 0.088 0.199 0.333 -0.083 -0.114 0.067 0.046 -0.151 0.136 0.014 0.020 0.075
Q023 0.043 0.165 0.161 0.161 0.220 0.198 0.061 0.262 0.152 0.212 0.084 0.237 0.070 0.141 0.185 0.207 0.100 0.213 0.126 0.240 0.245 0.178 0.231 0.088 1.000 0.262 0.104 -0.033 -0.124 0.133 0.025 -0.066 0.135 0.021 0.032 0.051
Q024 0.124 0.307 0.268 0.310 0.368 0.351 0.077 0.444 0.238 0.338 0.147 0.378 0.054 0.184 0.271 0.319 0.173 0.309 0.179 0.483 0.333 0.255 0.375 0.199 0.262 1.000 0.304 -0.037 -0.261 0.199 0.023 -0.137 0.280 0.035 0.095 0.094
Q025 0.052 0.190 0.209 0.204 0.247 0.253 0.055 0.276 0.067 0.248 0.240 0.265 0.052 0.187 0.210 0.291 0.097 0.263 0.053 0.173 0.234 0.080 0.151 0.333 0.104 0.304 1.000 -0.046 -0.128 0.084 0.018 -0.097 0.131 0.017 0.033 0.064
TP_ANO_CONCLUIU -0.358 0.070 -0.153 -0.161 -0.113 -0.121 -0.135 -0.105 -0.096 -0.115 -0.111 -0.157 -0.041 -0.076 -0.126 -0.113 -0.076 -0.107 -0.057 -0.113 -0.122 -0.080 -0.124 -0.083 -0.033 -0.037 -0.046 1.000 0.068 0.336 0.232 0.759 0.123 0.011 0.019 0.400
TP_COR_RACA -0.069 -0.231 -0.189 -0.193 -0.193 -0.193 0.037 -0.265 -0.141 -0.210 -0.087 -0.266 0.057 -0.115 -0.206 -0.239 -0.114 -0.245 -0.118 -0.264 -0.214 -0.104 -0.166 -0.114 -0.124 -0.261 -0.128 0.068 1.000 0.111 0.042 0.116 0.182 0.036 0.018 0.068
TP_ESCOLA 0.388 0.190 0.148 0.180 0.219 0.201 0.086 0.242 0.160 0.172 0.062 0.173 0.055 0.085 0.112 0.119 0.075 0.130 0.109 0.220 0.167 0.113 0.207 0.067 0.133 0.199 0.084 0.336 0.111 1.000 0.096 -0.313 0.135 0.020 0.049 0.707
TP_ESTADO_CIVIL 0.087 0.039 0.104 0.128 0.044 0.049 0.078 0.020 0.022 0.037 0.063 0.036 0.017 0.022 0.061 0.019 0.023 0.024 0.011 0.033 0.046 0.034 0.042 0.046 0.025 0.023 0.018 0.232 0.042 0.096 1.000 0.195 0.088 0.011 0.020 0.117
TP_FAIXA_ETARIA -0.591 -0.071 -0.260 -0.287 -0.184 -0.205 -0.147 -0.217 -0.144 -0.204 -0.165 -0.252 -0.047 -0.109 -0.183 -0.162 -0.098 -0.167 -0.086 -0.175 -0.190 -0.109 -0.177 -0.151 -0.066 -0.137 -0.097 0.759 0.116 -0.313 0.195 1.000 0.182 0.011 0.033 0.500
TP_LINGUA0.1110.2780.2360.2280.2750.2560.0450.2790.1440.2200.0940.2370.0750.1050.2020.2230.1000.2240.1090.2400.2260.1140.1720.1360.1350.2800.1310.1230.1820.1350.0880.1821.0000.0340.0960.140
TP_NACIONALIDADE0.0130.0340.0230.0240.0210.0200.0070.0260.0150.0260.0090.0260.0110.0100.0180.0230.0110.0230.0150.0360.0280.0140.0220.0140.0210.0350.0170.0110.0360.0200.0110.0110.0341.0000.0280.011
TP_SEXO0.0360.0570.0580.0570.0630.0620.0290.0860.0360.0590.0190.0570.0310.0360.0300.0630.0270.0570.0400.0520.0790.0180.0220.0200.0320.0950.0330.0190.0180.0490.0200.0330.0960.0281.0000.041
TP_ST_CONCLUSAO1.0000.0630.1340.1540.1050.1080.1060.1080.0930.1180.0870.1420.0320.0650.0970.0930.0570.0940.0520.1420.1080.0950.1500.0750.0510.0940.0640.4000.0680.7070.1170.5000.1400.0110.0411.000
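The column headers for this matrix sit above this excerpt, and the report does not say which coefficient each cell holds. Assuming the columns follow the row order (the 1.000 entries fall on the diagonal and mirrored cells agree), the stdlib sketch below lists every variable pair whose absolute coefficient reaches a threshold. `high_correlation_pairs` is a hypothetical helper, and the 0.5 default is an assumed cut-off, not a documented ydata-profiling setting. The sample matrix copies the values for the four TP_* variables TP_ANO_CONCLUIU, TP_ESCOLA, TP_FAIXA_ETARIA and TP_ST_CONCLUSAO from the rows above.

```python
from typing import List, Sequence, Tuple

# Sub-matrix copied from the table above, assuming columns follow the row order.
NAMES = ["TP_ANO_CONCLUIU", "TP_ESCOLA", "TP_FAIXA_ETARIA", "TP_ST_CONCLUSAO"]
MATRIX = [
    [1.000, 0.336, 0.759, 0.400],
    [0.336, 1.000, -0.313, 0.707],
    [0.759, -0.313, 1.000, 0.500],
    [0.400, 0.707, 0.500, 1.000],
]


def high_correlation_pairs(
    names: Sequence[str],
    matrix: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off, not a documented default
) -> List[Tuple[str, str, float]]:
    """Return (a, b, r) for each unordered pair with |r| >= threshold, strongest first."""
    pairs = []
    for i in range(len(names)):
        # Upper triangle only: skips the 1.000 diagonal and the mirrored cells.
        for j in range(i + 1, len(names)):
            r = matrix[i][j]
            if abs(r) >= threshold:
                pairs.append((names[i], names[j], r))
    # Sort by magnitude so strong negative coefficients rank alongside positive ones.
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return pairs


if __name__ == "__main__":
    for a, b, r in high_correlation_pairs(NAMES, MATRIX):
        print(f"{a} ~ {b}: {r:+.3f}")
```

At 0.5 this returns TP_ANO_CONCLUIU/TP_FAIXA_ETARIA (0.759), TP_ESCOLA/TP_ST_CONCLUSAO (0.707) and TP_FAIXA_ETARIA/TP_ST_CONCLUSAO (0.500), the same TP_* pairs named in the Alerts list, while TP_ANO_CONCLUIU/TP_ST_CONCLUSAO (0.400) stays below the cut.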

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
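The overview reports 0 missing cells, so both plots are uniformly complete for this dataset. For reproducing them outside the report, here is a minimal stdlib sketch of the two underlying summaries: a per-column missing count and a row-by-column presence grid. The helper names and toy records are hypothetical, and treating `None`, absent keys and float NaN as missing is an assumption (pandas also recognizes values such as `pd.NA` and `NaT`).

```python
import math
from typing import Any, Dict, Iterable, List, Mapping, Sequence


def is_missing(value: Any) -> bool:
    """Assumed rule: None or a float NaN counts as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nullity_by_column(rows: Iterable[Mapping[str, Any]], columns: Sequence[str]) -> Dict[str, int]:
    """Missing-cell count per column; a key absent from a record counts as missing."""
    counts = {c: 0 for c in columns}
    for row in rows:
        for c in columns:
            if is_missing(row.get(c)):
                counts[c] += 1
    return counts


def nullity_matrix(rows: Iterable[Mapping[str, Any]], columns: Sequence[str]) -> List[List[bool]]:
    """One list per record, True where the cell is present (the filled cells of a nullity matrix)."""
    return [[not is_missing(row.get(c)) for c in columns] for row in rows]


# Hypothetical toy records, not rows from this dataset.
COLUMNS = ["TP_FAIXA_ETARIA", "Q001", "MEDIAS"]
ROWS = [
    {"TP_FAIXA_ETARIA": 1, "Q001": 2, "MEDIAS": 500.0},
    {"TP_FAIXA_ETARIA": 7, "Q001": None, "MEDIAS": float("nan")},
    {"TP_FAIXA_ETARIA": 2, "MEDIAS": 520.5},
]

if __name__ == "__main__":
    print(nullity_by_column(ROWS, COLUMNS))
    # Text rendering of the matrix: '#' = present, '.' = missing.
    for flags in nullity_matrix(ROWS, COLUMNS):
        print("".join("#" if present else "." for present in flags))
```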

Sample

TP_FAIXA_ETARIA TP_SEXO TP_ESTADO_CIVIL TP_COR_RACA TP_NACIONALIDADE TP_ST_CONCLUSAO TP_ANO_CONCLUIU TP_ESCOLA IN_TREINEIRO TP_LINGUA Q001 Q002 Q003 Q004 Q005 Q006 Q007 Q008 Q009 Q010 Q011 Q012 Q013 Q014 Q015 Q016 Q017 Q018 Q019 Q020 Q021 Q022 Q023 Q024 Q025 MEDIAS
44015610111301102222321221122211101003021484.78
70734130111202013233421332221212102103021540.66
189022320131202013344221232121111102003011522.98
129073330131202012231321221121111102003010549.00
11501620111202003232421231222211102003021618.80
153366930131202001232321332122211102003011606.86
179502661101151003256431232121211103013021522.92
227022851111131003345131231122112102002121637.00
68951631111202012211421232222112102003011603.04
1473162110031181002244321332122212103103121540.66
TP_FAIXA_ETARIA TP_SEXO TP_ESTADO_CIVIL TP_COR_RACA TP_NACIONALIDADE TP_ST_CONCLUSAO TP_ANO_CONCLUIU TP_ESCOLA IN_TREINEIRO TP_LINGUA Q001 Q002 Q003 Q004 Q005 Q006 Q007 Q008 Q009 Q010 Q011 Q012 Q013 Q014 Q015 Q016 Q017 Q018 Q019 Q020 Q021 Q022 Q023 Q024 Q025 MEDIAS
201969040111121003344441332122222113003031615.14
142000650111121003244321332121212102013021696.72
174630091131171011221321232222211102103011609.58
72736561131131001211321231221211102003021542.56
127890331131111003344321331221112102003011539.34
1062964100131161011236421231121211102003131521.12
109129050131121012231221331221212101003011456.28
243573631111202012222121211121112101012011418.74
225488440131101011111321231121111102002011562.22
77669171121141011211421231221111102002010465.26

Duplicate rows

Most frequently occurring

TP_FAIXA_ETARIA TP_SEXO TP_ESTADO_CIVIL TP_COR_RACA TP_NACIONALIDADE TP_ST_CONCLUSAO TP_ANO_CONCLUIU TP_ESCOLA IN_TREINEIRO TP_LINGUA Q001 Q002 Q003 Q004 Q005 Q006 Q007 Q008 Q009 Q010 Q011 Q012 Q013 Q014 Q015 Q016 Q017 Q018 Q019 Q020 Q021 Q022 Q023 Q024 Q025 MEDIAS # duplicates
0 2 0 1 1 1 2 0 3 0 0 3 3 2 2 4 3 1 3 3 2 1 2 2 2 1 2 1 1 3 0 1 3 1 2 1 563.88 2
1 2 0 1 3 1 2 0 2 0 0 2 2 3 2 4 2 1 2 3 1 2 2 1 1 1 1 1 0 2 0 0 3 0 1 1 505.26 2
2 2 0 1 3 1 2 0 2 0 1 1 1 1 1 4 1 1 2 3 1 1 2 1 1 1 1 1 0 2 0 0 3 0 1 1 490.26 2
3 2 0 1 3 1 2 0 2 0 1 2 2 1 1 3 1 1 2 3 1 1 2 1 1 1 1 1 0 2 0 0 2 0 1 0 459.50 2
4 2 0 1 3 1 2 0 2 0 1 3 2 3 2 4 2 1 2 3 1 1 2 1 1 1 1 1 0 2 0 0 3 0 1 1 529.14 2
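The duplicate summary counts rows that repeat an earlier row and lists the most common repeated combinations. A minimal stdlib sketch of both computations follows; the function names and toy rows are hypothetical. Counting only the copies beyond the first occurrence mirrors the default of pandas `DataFrame.duplicated(keep="first")`, which is an assumption about how this report derives its figure, not something the report states.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence, Tuple

Row = Tuple[Hashable, ...]


def duplicate_row_count(rows: Iterable[Sequence[Hashable]]) -> int:
    """Number of rows that repeat an earlier row (every copy beyond the first)."""
    counts = Counter(tuple(r) for r in rows)
    return sum(n - 1 for n in counts.values() if n > 1)


def most_frequent_duplicates(rows: Iterable[Sequence[Hashable]], top_n: int = 10) -> List[Tuple[Row, int]]:
    """Repeated rows with their occurrence counts, most frequent first."""
    counts = Counter(tuple(r) for r in rows)
    repeated = [(row, n) for row, n in counts.items() if n > 1]
    # list.sort is stable, so rows with equal counts keep their first-seen order.
    repeated.sort(key=lambda item: item[1], reverse=True)
    return repeated[:top_n]


# Hypothetical toy rows: (TP_FAIXA_ETARIA, TP_SEXO, TP_ESCOLA, MEDIAS).
ROWS = [
    (2, 0, 3, 550.0),
    (2, 0, 2, 510.5),
    (2, 0, 3, 550.0),
    (2, 0, 2, 510.5),
    (1, 1, 1, 600.0),
    (2, 0, 2, 510.5),
]

if __name__ == "__main__":
    print(duplicate_row_count(ROWS))
    for row, n in most_frequent_duplicates(ROWS):
        print(row, n)
```

Counting each extra copy once means a combination seen twice adds 1 to the total, which is why a table of repeated rows each occurring twice sums to one duplicate per listed row.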